Are rule-based syllabification methods adequate for languages with low syllabic complexity? The case of Italian
Abstract
Syllabification information is a valuable component in speech synthesis systems. Linguistic rule-based methods have been assumed to be the best technique for determining the syllabification of unknown words. This has recently been shown to be incorrect for English, where data-driven algorithms outperform rule-based methods. It may be, however, that data-driven methods are only better for languages with complex syllable structures. In this paper, three rule-based automatic syllabification systems and two data-driven ones (Syllabification by Analogy and the Look-Up Procedure) are compared on a language with lower syllabic complexity, Italian. Using a leave-one-out procedure on 44,720 words, the best data-driven algorithm (Syllabification by Analogy) achieved 97.70% word accuracy, while the best rule-based method correctly syllabified 89.77% of words. These results show that data-driven methods can also outperform rule-based methods on Italian syllabification, indicating that data-driven methods may be the best approach to the syllabification component of speech synthesis systems.
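The leave-one-out word accuracy measure mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the four-word lexicon, and the toy rule (break before any consonant that directly precedes a vowel) are all assumptions made for illustration. Each word is held out in turn, the remaining entries are passed to the syllabifier as its only knowledge, and a word counts as correct only if every boundary matches the reference.

```python
from typing import Callable, Dict

# word -> reference syllabification, with "-" marking syllable boundaries
Lexicon = Dict[str, str]
Syllabifier = Callable[[Lexicon, str], str]

VOWELS = set("aeiou")


def toy_rule_syllabifier(training: Lexicon, word: str) -> str:
    """Hypothetical rule-based baseline (ignores the training data).

    Inserts a boundary before any consonant that is directly followed by
    a vowel, once at least one vowel has been seen. It is deliberately
    naive and mishandles onset clusters such as "st" or "br".
    """
    out = []
    seen_vowel = False
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if seen_vowel and ch not in VOWELS and nxt in VOWELS:
            out.append("-")
        out.append(ch)
        if ch in VOWELS:
            seen_vowel = True
    return "".join(out)


def leave_one_out_word_accuracy(lexicon: Lexicon, predict: Syllabifier) -> float:
    """Percentage of words whose predicted syllabification matches exactly."""
    if not lexicon:
        raise ValueError("lexicon must not be empty")
    correct = 0
    for word, gold in lexicon.items():
        # Hold out the test word so the syllabifier cannot simply look it up.
        held_out = {w: s for w, s in lexicon.items() if w != word}
        if predict(held_out, word) == gold:
            correct += 1
    return 100.0 * correct / len(lexicon)


# Tiny illustrative lexicon (assumed, not taken from the paper's data).
lexicon = {"casa": "ca-sa", "mano": "ma-no", "pasta": "pa-sta", "libro": "li-bro"}
accuracy = leave_one_out_word_accuracy(lexicon, toy_rule_syllabifier)
```

A data-driven method would plug into the same `predict` slot and actually use the held-out lexicon, which is why the held-out copy is rebuilt for every test word. Copying the dictionary per word is quadratic and only suitable for a small sketch.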
Similar resources
Syllabification rules versus data-driven methods in a language with low syllabic complexity: The case of Italian
Linguistic rules have been assumed to be the best technique for determining the syllabification of unknown words. This has recently been challenged for the English language where data-driven algorithms have been shown to outperform rule-based methods. It may be possible, however, that data-driven methods are only better for languages with complex syllable structures. In this study, three rule-b...
Imdlawn Tashlhiyt Berber Syllabification is Quantifier-Free
Imdlawn Tashlhiyt Berber (ITB) is unusual due to its tolerance of non-vocalic syllabic nuclei. Rule-based and constraint-based accounts of ITB syllabification do not directly address the question of how complex the process is. Model theory and formal logic allow for comparison of complexity across different theories of phonology by identifying the computational power (or expressivity) of lingui...
Automatic syllabification for Danish text-to-speech systems
In this paper, a rule-based automatic syllabifier for Danish is described using the Maximal Onset Principle. Prior success rates of rule-based methods applied to Portuguese and Catalan syllabification modules were the basis of this work. The system was implemented and tested using a very small set of rules. The results yielded word accuracy rates of 96.9% and 98.7%, contrary to our initi...
Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank
Credit risk management is a process in which banks estimate the probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data; these internal data sets are usually completed using external credit bureau data and finally used for estimating PD in banks. There is also a continuous interest for banks to use rule-based classifiers to b...
Sylli: Automatic Phonological Syllabification for Italian
We will present a complete syllabifier for Italian (Sylli) that is based on phonological principles, flexible, and easy to adapt for other uses, alphabets and languages. Crucial concepts regarding syllabification principles in modern phonological theory will be discussed (§1.1); specific issues concerning Italian syllabification will then be summarised (§1.2) and an overview of the available au...
Publication date: 2007